Search Result

Chinese speech segmentation method based on Gauss distribution of time spans of syllables

ZHANG Yang, ZHAO Xiaoqun, WANG Digang

Journal of Computer Applications 2016, 36 (5): 1410-1414. DOI: 10.11772/j.issn.1001-9081.2016.05.1410

To date, there is no accurate method for segmenting natural Chinese speech into syllables, a task that matters for labeling speech against a reference text automatically rather than by hand. Based on two hypotheses, that the durations of Chinese syllables with the same pronunciation follow a Gaussian distribution and that a short-time energy valley exists between two adjacent syllables, a Chinese speech segmentation method based on the Gaussian distribution of syllable durations was proposed. A simplified method based on the distribution of energy valleys was also given, which effectively reduced the time complexity of the segmentation method. The experimental results show that the segmentation accuracy (the mean square value of the time differences between manual labels and labels produced by this method) reaches 10^-3 and the computing time is less than 1 s in Matlab on a PC.
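The two hypotheses in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the framing parameters, and the use of strict local minima as "valleys" are all assumptions made for illustration, and the abstract does not say how valleys and duration scores are combined.

```python
import math

def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples per frame; trailing samples that do not fill a frame are dropped."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [
        sum(x * x for x in signal[i * hop : i * hop + frame_len])
        for i in range(n_frames)
    ]

def energy_valleys(energy):
    """Indices of strict local minima of the frame-energy curve (assumed valley definition)."""
    return [
        i for i in range(1, len(energy) - 1)
        if energy[i] < energy[i - 1] and energy[i] < energy[i + 1]
    ]

def gaussian_log_pdf(duration, mean, std):
    """Log-density of a syllable duration under a Gaussian with the given mean and std."""
    return -0.5 * math.log(2 * math.pi * std * std) - (duration - mean) ** 2 / (2 * std * std)

# Toy signal: two loud stretches separated by a quiet gap.
toy = [1.0] * 100 + [0.1] * 100 + [1.0] * 100
frame_energy = short_time_energy(toy, frame_len=100, hop=100)
candidate_boundaries = energy_valleys(frame_energy)
```

In this toy case the only candidate boundary is the quiet middle frame. A duration score such as `gaussian_log_pdf` could then be used to prefer boundary choices whose implied syllable lengths are plausible, though the mean and standard deviation would have to be estimated from labeled data.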

Unvoiced/voiced mode codebook design algorithm based on cellular evenness

XU Jingyun, ZHAO Xiaoqun, CAI Zhiduan, WANG Peiliang

Journal of Computer Applications 2016, 36 (12): 3374-3377. DOI: 10.11772/j.issn.1001-9081.2016.12.3374

The parameter distributions of unvoiced and voiced Line Spectrum Frequency (LSF) differ. To improve the quantization performance of LSF parameters in vocoders, an unvoiced/voiced mode codebook design algorithm based on Cell Evenness (CE) was presented, which uses the difference between the unvoiced and voiced LSF parameter distributions together with CE. Firstly, the optimal ratio of the numbers of unvoiced and voiced LSF parameters participating in codebook training was derived according to CE. Then, the specified number of atypical LSF parameters were eliminated from the unvoiced speech. Finally, the codebook was retrained. The experimental results show that, compared with the shared codebook algorithm at the same bit rate, the proposed algorithm reduced the average spectral distortion by 2.5%, increased the mean opinion score by 2.3%, and reduced codebook storage by 21.1%. The proposed algorithm is also applicable to vocoders without unvoiced/voiced symbol transmission.

Chinese speech segmentation into syllables based on energies in different times and frequencies

ZHANG Yang, ZHAO Xiaoqun, WANG Digang

Journal of Computer Applications 2016, 36 (11): 3222-3228. DOI: 10.11772/j.issn.1001-9081.2016.11.3222

Precise speech segmentation methods help in comparing speech with speech models in speech recognition and can also greatly improve the efficiency of corpus annotation. A new method for segmenting Chinese speech into syllables based on time-frequency energy features was proposed: firstly, silence frames were found in the traditional way; secondly, unvoiced frames were found using the difference of energies in different frequency bands; thirdly, voiced frames and speech frames were found with the help of 0-1 energies in specific frequency ranges; finally, syllable positions were determined from the above decisions. The experimental results show that the proposed method, whose syllable error is 0.0297 s and syllable deviation is 7.93%, is superior to the Merging-Based Syllable Detection Automaton (MBSDA) and the Gaussian fitting method.
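The second step, comparing energies in different frequency bands, might be sketched as follows. This is an illustration only: the abstract gives neither the band edges nor the decision rule, so the 2000 Hz split, the "high band exceeds low band" test, and both function names are assumptions. A naive DFT is used to keep the example free of dependencies.

```python
import cmath
import math

def band_energy(frame, sample_rate, f_lo, f_hi):
    """Energy of the DFT bins with frequency in [f_lo, f_hi), using a naive O(N^2) DFT."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            x_k = sum(
                frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)
            )
            total += abs(x_k) ** 2
    return total

def looks_unvoiced(frame, sample_rate, split_hz=2000.0):
    """Assumed rule: flag a frame as unvoiced when the high band carries more energy."""
    low = band_energy(frame, sample_rate, 0.0, split_hz)
    high = band_energy(frame, sample_rate, split_hz, sample_rate / 2 + 1)
    return high > low

# Toy frames at 8 kHz: a 500 Hz tone (voiced-like) and a 3000 Hz tone (fricative-like).
rate, n = 8000, 64
low_tone = [math.sin(2 * math.pi * 500 * t / rate) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 3000 * t / rate) for t in range(n)]
```

With these toy frames, `looks_unvoiced(low_tone, rate)` is false and `looks_unvoiced(high_tone, rate)` is true. Real speech would need windowing and thresholds tuned on labeled data.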
